An Introduction to Image Segmentation

Image segmentation is the process of partitioning an image into several coherent sub-regions according to extracted features (e.g., color, brightness, or texture). It plays an important role in a broad range of applications, including video surveillance, augmented reality, recognition tasks (face, fingerprint, iris, etc.), autonomous vehicles (navigable surface detection, object detection, etc.), and medical image analysis, just to name a few. Segmentation aims to extract meaningful information for easier and enhanced analysis; as an illustrative example, a diagnosis based on features extracted from a segmented CT scan of a lung will most probably be more accurate than one where the lung has not been isolated.

Many distinct segmentation methods and algorithms have been discussed in the literature, and there are several ways to classify them. For instance, they can be divided into semantic segmentation, which performs pixel-level labeling with a set of object categories (e.g., car, human) for all image pixels, and instance segmentation, which extends the scope of semantic segmentation by detecting and delineating each individual object of interest in the image. There is also a third type, panoptic segmentation, which unifies the two basic segmentation processes.

An example of different types of image segmentation.

They can also be split into three categories: manual, semiautomatic, and automatic. An advantage of manual segmentation is that it can draw on expert knowledge (as it is usually performed by an expert); its drawbacks are that it is very time-consuming and prone to intra- and interobserver variability, which can result in large differences in the extracted features.

Semiautomatic segmentation tries to solve some of the problems of manual segmentation: by using algorithms, the effort and time spent by the user can be reduced. Semiautomatic algorithms also aim to reduce inter- and intraobserver variability. However, interobserver variability will still be present, as both the manual part of the segmentation and the settings of the algorithm influence the result.

Automatic segmentation methods do not rely on user interaction and can be split into two categories: learning-based and non-learning-based. Deep learning has recently become very popular as a learning-based approach, in which the segmentation is performed by a neural network trained on labeled examples. The advantage of automatic segmentation is that, once the method has been constructed, segmentations can be performed relatively quickly. Moreover, the resulting segmentations are consistent and reproducible. The disadvantage of deep learning is that it usually requires a lot of labeled data to train an accurate model, as well as a long training time and specialized hardware.

Image segmentation algorithms can also be grouped into different types according to their architecture.

DICOM Processing in Python

DICOM (Digital Imaging and Communications in Medicine) is a standard protocol for the management and transmission of medical images and related data, and is used in many healthcare facilities. This section is dedicated to visualizing Achilles tendon CT images (DICOM files) in 1D (as a histogram), 2D, and 3D. Finally, we will create segmentation masks that remove all voxels except those of the Achilles tendon. To achieve this, we’ll run Python code based on [2] in RStudio, using the reticulate library.

Getting Ready

The reticulate package provides a comprehensive set of tools for interoperability between Python and R, so we’ll start by installing and loading it (see [3] and [4]). We are now almost ready to run Python code in RStudio; before doing so, we must install some Python modules (if they are not installed yet) by running the following commands in a terminal:

pip install pydicom

conda install plotly

conda install scikit-image (make sure you are using scikit-image 0.13 or above)

pip install scikit-learn

pip install matplotlib

Import packages

Once we are working in a Python environment, let us begin by importing the following packages:

import numpy as np
import pydicom
import os
import matplotlib.pyplot as plt
from glob import glob
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
import scipy.ndimage
from skimage import morphology
from skimage import measure
from skimage.transform import resize
from sklearn.cluster import KMeans
from plotly import __version__
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
from pydicom.pixel_data_handlers.util import apply_modality_lut
import plotly.figure_factory
import matplotlib

We specify the input and output data folder paths:

data_path = "./Images/DICOM/AX DP 1RA68062501 (id=1)/FILESET"
output_path = working_path = "./Images/DICOM/AX DP 1RA68062501 (id=1)"
g = glob(data_path + '/*')
# Print out the first 5 file names to verify we are in the right folder
print ("Total of %d DICOM images.\nFirst 5 filenames:" % len(g))
print ('\n'.join(g[:5]))

Helper Functions

  • load_scan: loads all DICOM images from a folder into a list for manipulation.
  • get_pixels_hu: converts raw voxel values into Hounsfield units (HU), a quantitative measurement of radiodensity used by radiologists in the interpretation of computed tomography (CT) images.
def load_scan(path):
    slices = [pydicom.dcmread(os.path.join(path, s)) for s in os.listdir(path)]
    slices.sort(key=lambda x: int(x.InstanceNumber))
    try:
        slice_thickness = np.abs(slices[0].ImagePositionPatient[2] - slices[1].ImagePositionPatient[2])
    except AttributeError:
        slice_thickness = np.abs(slices[0].SliceLocation - slices[1].SliceLocation)

    for s in slices:
        s.SliceThickness = slice_thickness

    return slices

def get_pixels_hu(scans):
    image = np.stack([s.pixel_array for s in scans])
    # Convert to int16 (values are sometimes stored as uint16),
    # which should be safe as values should always be low enough (< 32k)
    image = image.astype(np.int16)

    # Set outside-of-scan pixels (flagged as -2000) to 0
    # The intercept is usually -1024, so 0 maps approximately to air
    image[image == -2000] = 0
    
    # Convert to Hounsfield units (HU)
    intercept = scans[0].RescaleIntercept if 'RescaleIntercept' in scans[0] else -1024
    slope = scans[0].RescaleSlope if 'RescaleSlope' in scans[0] else 1
    
    image = slope * image.astype(np.float64)
    image = image.astype(np.int16)
    image += np.int16(intercept)
    
    return np.array(image, dtype=np.int16)

id = 1
patient = load_scan(data_path)
patient_pixels = get_pixels_hu(patient)

The conversion uses two values from the DICOM files’ metadata, Rescale Intercept and Rescale Slope (see the lines below # Convert to Hounsfield units (HU)), which are usually stored in the DICOM header at acquisition time. However, they are absent from our dataset. Most CT scans use -1024 for the intercept and 1 for the slope, but these defaults may not be correct in our case; we’ll use them anyway to explain how to interpret the results.
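As a quick sanity check, the linear rescaling HU = slope · raw + intercept can be worked through by hand. The slope and intercept below are the assumed defaults discussed above (not values read from our headers), and the raw values are hypothetical:

```python
import numpy as np

# Assumed defaults, since RescaleSlope/RescaleIntercept are absent from our headers
slope, intercept = 1, -1024

# Three hypothetical stored pixel values
raw = np.array([0, 1024, 2024], dtype=np.int16)

# Same arithmetic as get_pixels_hu: scale as float, cast back, add the intercept
hu = (slope * raw.astype(np.float64)).astype(np.int16) + np.int16(intercept)

print(hu)  # roughly: air (-1024 HU), water (0 HU), dense bone (1000 HU)
```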

The next code line saves the new data to disk so we don’t have to reprocess the stack every time.

np.save(output_path + "/fullimages_%d.npy" % (id), patient_pixels)

Displaying Images

The first thing we should do is check whether the Hounsfield units are properly scaled and represented. Below is a quick list of HU values for a few useful tissues and substances, sourced from Wikipedia.

Some HU values.
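In case the table does not render, a few approximate reference values (rounded; exact ranges vary by source) can be kept at hand in code:

```python
# Approximate Hounsfield unit reference values (rounded; sources differ)
HU_REFERENCE = {
    "air": -1000,
    "fat": -100,             # roughly -120 to -90
    "water": 0,              # 0 by definition
    "muscle": 40,            # roughly +35 to +55
    "cancellous bone": 400,  # roughly +300 to +400
    "cortical bone": 1000,   # can exceed +1800
}

# Water and air anchor the scale: HU = 1000 * (mu - mu_water) / (mu_water - mu_air)
print(sorted(HU_REFERENCE, key=HU_REFERENCE.get))
```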

Note that HU values differ depending on the source. Let’s now create a histogram of all the pixel data in the study.

file_used=output_path+"/fullimages_%d.npy" % id
patient_pixels = np.load(file_used).astype(np.float64)

plt.hist(patient_pixels.flatten(), bins=50, color='c')
plt.xlabel("Hounsfield Units (HU)")
plt.ylabel("Frequency")
plt.show()

As mentioned earlier, the conversion may not have been done properly due to the absence of Rescale Intercept and Rescale Slope in the DICOM header. If it were correct, the peak around -1000 would correspond to background and air; there would also be soft tissues such as fat, muscle, and the Achilles tendon in roughly equal parts (from -500 to 700), and the highest values would correspond to bone.
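To inspect a particular tissue range visually, a common trick is intensity windowing. The sketch below is a minimal version, assuming the soft-tissue range of -500 to 700 HU mentioned above: it clips the image to that range and rescales it to [0, 1] for display.

```python
import numpy as np

def window_image(hu_image, lo=-500.0, hi=700.0):
    """Clip an HU image to [lo, hi] and rescale it to [0, 1] for display."""
    windowed = np.clip(hu_image.astype(np.float64), lo, hi)
    return (windowed - lo) / (hi - lo)

# Quick check on a handful of HU values
demo = np.array([-1000.0, -500.0, 100.0, 700.0, 2000.0])
print(window_image(demo))
```

A windowed slice could then be shown with plt.imshow(window_image(patient_pixels[20]), cmap='gray'), which typically gives better soft-tissue contrast than plotting the full HU range.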

Displaying an Image Stack

We’ll show every second slice to get a representative look at the study.

id = 1
imgs_to_process = np.load(output_path+'/fullimages_{}.npy'.format(id))
def sample_stack(stack, rows=4, cols=5, start_with=0, show_every=2):
    fig, ax = plt.subplots(rows, cols, figsize=[12, 12])
    for i in range(rows * cols):
        ind = start_with + i * show_every
        ax[i // cols, i % cols].set_title('slice %d' % ind)
        ax[i // cols, i % cols].imshow(stack[ind], cmap='gray')
        ax[i // cols, i % cols].axis('off')
    plt.show()
sample_stack(imgs_to_process)

The images are oriented so that the area close to the heel is at the bottom and the area close to the toes is at the top. The Achilles tendon is the dark oval region at the bottom of the slices from 0 to 26.

Resampling

In order to build a 3D plot through the slices, we need to know their thickness. We can obtain it from the DICOM header.

print ("Slice Thickness: %f" % patient[0].SliceThickness)
## Slice Thickness: 2.999637
print ("Pixel Spacing (row, col): (%f, %f) " % (patient[0].PixelSpacing[0], patient[0].PixelSpacing[1]))
## Pixel Spacing (row, col): (0.198900, 0.198900)

This means we have 3 mm slices, and each pixel side is 0.1989 mm long.

Although a CT slice is typically reconstructed at 512 x 512 pixels, ours are 704 x 704, so each slice represents approximately 704 · 0.1989 mm = 140 mm of data in length and width.
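These figures can be double-checked directly from the header values reported above:

```python
# Values read from the DICOM header above
pixel_spacing_mm = 0.1989
slice_thickness_mm = 2.999637
rows = cols = 704
n_slices = 40

in_plane_mm = cols * pixel_spacing_mm        # in-plane field of view
z_extent_mm = n_slices * slice_thickness_mm  # extent along the scan axis

print(round(in_plane_mm, 1), round(z_extent_mm, 1))  # 140.0 120.0
```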

In order to display the CT in 3D isometric form, and also to compare between different scans, it would be useful to ensure that each slice is resampled in 1x1x1 mm voxels.

id = 1
imgs_to_process = np.load(output_path+'/fullimages_{}.npy'.format(id))
def resample(image, scan, new_spacing=[1,1,1]):
    # Determine current pixel spacing
    spacing = map(float, ([scan[0].SliceThickness] + list(scan[0].PixelSpacing)))
    spacing = np.array(list(spacing))

    resize_factor = spacing / new_spacing
    new_real_shape = image.shape * resize_factor
    new_shape = np.round(new_real_shape)
    real_resize_factor = new_shape / image.shape
    new_spacing = spacing / real_resize_factor
    
    image = scipy.ndimage.zoom(image, real_resize_factor)
    
    return image, new_spacing

print ("Shape before resampling\t", imgs_to_process.shape)
## Shape before resampling   (40, 704, 704)
imgs_after_resamp, spacing = resample(imgs_to_process, patient, [1,1,1])
print ("Shape after resampling\t", imgs_after_resamp.shape)
## Shape after resampling    (120, 140, 140)
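The resampled shape can be predicted from the spacings alone, which makes a useful sanity check on the output of resample:

```python
import numpy as np

old_shape = np.array([40, 704, 704])            # (slices, rows, cols), as above
spacing = np.array([2.999637, 0.1989, 0.1989])  # mm per voxel along each axis
new_spacing = np.array([1.0, 1.0, 1.0])         # target isotropic 1 mm voxels

predicted_shape = np.round(old_shape * spacing / new_spacing).astype(int)
print(predicted_shape)  # matches the (120, 140, 140) reported above
```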

3D Plotting

We now have enough information to plot the DICOM image in 3D space. We do this by creating a high-quality static image using matplotlib’s 3D capability (the plt_3d function) and an interactive render using plotly, which has WebGL support via JavaScript (the plotly_3d function). The marching cubes algorithm is used to generate a 3D mesh from the dataset.

def make_mesh(image, threshold=300, step_size=1):
    print ("Transposing surface")
    p = image.transpose(2,1,0)
    
    print ("Calculating surface")
    verts, faces, norm, val = measure.marching_cubes(p, threshold, step_size=step_size, allow_degenerate=True) 
    return verts, faces
  
def plotly_3d(verts, faces):
    x,y,z = zip(*verts) 
    
    print ("Drawing")
    
    # Make the colormap single color since the axes are positional not intensity. 
#    colormap=['rgb(255,105,180)','rgb(255,255,51)','rgb(0,191,255)']
    colormap=['rgb(236, 236, 212)','rgb(236, 236, 212)']
    
    fig = plotly.figure_factory.create_trisurf(x=x,
                        y=y, 
                        z=z, 
                        plot_edges=False,
                        colormap=colormap,
                        simplices=faces,
                        backgroundcolor='rgb(64, 64, 64)',
                        title="Interactive Visualization")
    iplot(fig)
    
def plt_3d(verts, faces):
    print ("Drawing")
    x,y,z = zip(*verts) 
    fig = plt.figure(figsize=(10, 10))
    ax = fig.add_subplot(111, projection='3d')
    # Fancy indexing: `verts[faces]` to generate a collection of triangles
    mesh = Poly3DCollection(verts[faces], linewidths=0.05, alpha=1)
    face_color = [1, 1, 0.9]
    mesh.set_facecolor(face_color)
    ax.add_collection3d(mesh)
    ax.set_xlim(0, max(x))
    ax.set_ylim(0, max(y))
    ax.set_zlim(0, max(z))
    ax.set_facecolor((0.7, 0.7, 0.7))
    plt.show()
    
v, f = make_mesh(imgs_after_resamp[::-1])
## Transposing surface
## Calculating surface
plt_3d(v, f)
## Drawing

plotly_3d(v, f)
## Drawing

To see the interactive render, the code needs to be run in the console.

Semi-Automatic Achilles Tendon Segmentation Using 3DSlicer

In order to train an automatic model based on a convolutional neural network, we need a set of already-segmented images. This section describes how to segment the Achilles tendon from an ankle CT using 3DSlicer, “a free, open source and multi-platform software package widely used for medical, biomedical, and related imaging research”. The following explanation does not aim to serve as a manual, but to illustrate how semi-automatic segmentation can be performed; software-specific usage details are therefore omitted.

First of all, we need to load the data into 3DSlicer. In our case it is a set of 35 DICOM images corresponding to different axial DP slices of the ankle. Once they have been loaded, four windows are displayed: the top-left one shows axial sections (horizontal slices, the images just as they are); the bottom-left one shows coronal sections (dividing the body into ventral and dorsal parts); the bottom-right one shows sagittal sections (dividing it into right and left parts); and the top-right one builds a 3D volume of the ankle (enabled in the “Volume Rendering” module). In the “Segment Editor” module we create three segments: one for the air (clay), one for the Achilles tendon (yellow), and another for the rest of the ankle (green).
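Outside of 3DSlicer, these three views simply correspond to slicing the voxel array along different axes. A minimal numpy illustration, assuming the volume is indexed as (axial slice, row, column):

```python
import numpy as np

# Toy volume indexed as (axial slice, row, column)
volume = np.arange(3 * 4 * 5).reshape(3, 4, 5)

axial = volume[1, :, :]    # horizontal cut, shape (rows, cols)
coronal = volume[:, 2, :]  # ventral/dorsal cut, shape (slices, cols)
sagittal = volume[:, :, 3] # left/right cut, shape (slices, rows)

print(axial.shape, coronal.shape, sagittal.shape)  # (4, 5) (3, 5) (3, 4)
```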

Threshold range coloured in clay

Using the “Paint” tool, we plant some seeds for each segment on a few slices of each section (4-5 images per segment and section). Before that, we set a threshold range (from 2 to 1000, for example) to make the task easier, since out-of-range pixels won’t be considered. It is advisable to colour the “boundary slices”, where the Achilles tendon joins the calcaneus bone, with particular care.

Planting some seeds

After that, we use the “Grow from seeds” tool to classify every pixel of every slice within the threshold range into one of the three segments. We check the results and apply corrections if necessary. We can view a provisional 3D reconstruction of the tendon based on the segmentation so far by clicking “Show 3D” and hiding the air and ankle segments.
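3DSlicer’s “Grow from seeds” is a sophisticated competing region-growing algorithm. The toy sketch below (pure Python/numpy, for illustration only, not 3DSlicer’s actual implementation) captures the core idea: seed labels expand simultaneously, claiming neighbouring pixels whose intensity falls within the threshold range.

```python
from collections import deque

import numpy as np

def grow_from_seeds(image, seeds, lo, hi):
    """Toy competing region growing: each seeded label expands breadth-first
    into 4-connected neighbours whose intensity lies within [lo, hi]."""
    labels = np.zeros(image.shape, dtype=int)
    queue = deque()
    for (r, c), label in seeds.items():
        labels[r, c] = label
        queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < image.shape[0] and 0 <= nc < image.shape[1]
                    and labels[nr, nc] == 0 and lo <= image[nr, nc] <= hi):
                labels[nr, nc] = labels[r, c]  # first region to arrive claims the pixel
                queue.append((nr, nc))
    return labels

# Two bright blobs separated by an out-of-range (0-valued) column
img = np.array([[10, 10, 0, 50, 50],
                [10, 10, 0, 50, 50]])
result = grow_from_seeds(img, {(0, 0): 1, (0, 4): 2}, lo=2, hi=1000)
print(result)  # the column of zeros stays unlabelled, like out-of-threshold pixels
```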

Grown from seeds

As you can see, some pixels not belonging to the Achilles tendon are coloured in yellow, while some that do belong are not coloured (the little holes in the yellow area of the axial section). To correct this, we remove the threshold range by disabling “Editable intensity range” and use the “Erase” and “Smoothing” tools (besides “Paint”) on every axial section image. We choose the “Closing” smoothing method to fill the holes and the “Mean” smoothing method to smooth the edges. We can lower the segment’s colour opacity in the “Segmentations” module to help us refine the results.

Achilles tendon segmentation

We have finally segmented the Achilles tendon. Now we can remove the air and ankle segments and export the segmentation to a labelmap in the “Segmentations” module. 3DSlicer also allows us to extract the images’ radiomic features; to do so, we switch to the “Radiomics” module and generate the table using the labelmap.

Getting the radiomic features

Image classification from CT scans by training a 3D convolutional neural network

In this section we implement a convolutional neural network to classify Achilles tendons as normal or abnormal from 3D ankle CT scans, adapting the code from https://keras.io/examples/vision/3D_image_classification/.

Before we start, we need to import some modules:

import os
import zipfile
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

Now we load our data, a set of NIfTI files, and apply a first round of preprocessing:

import nibabel as nib
from scipy import ndimage

def read_nifti_file(filepath):
    """Read and load volume"""
    # Read file
    scan = nib.load(filepath)
    # Get raw data
    scan = scan.get_fdata()
    return scan

def normalize(volume):
    """Normalize the volume"""
    min = -1024
    max = 3000
    volume[volume < min] = min
    volume[volume > max] = max
    volume = (volume - min) / (max - min)
    volume = volume.astype("float32")
    return volume


def resize_volume(img):
    """Resize across z-axis"""
    # Set the desired depth
    desired_depth = 64
    desired_width = 128
    desired_height = 128
    # Get current depth
    current_depth = img.shape[-1]
    current_width = img.shape[0]
    current_height = img.shape[1]
    # Compute depth factor
    depth = current_depth / desired_depth
    width = current_width / desired_width
    height = current_height / desired_height
    depth_factor = 1 / depth
    width_factor = 1 / width
    height_factor = 1 / height
    # Rotate
    img = ndimage.rotate(img, 90, reshape=False)
    # Resize across z-axis
    img = ndimage.zoom(img, (width_factor, height_factor, depth_factor), order=1)
    return img


def process_scan(path):
    """Read and resize volume"""
    # Read scan
    volume = read_nifti_file(path)
    # Normalize
    volume = normalize(volume)
    # Resize width, height and depth
    volume = resize_volume(volume)
    return volume
  
# Train paths

normal_scan_paths_train = [
    os.path.join(os.getcwd(), "./Images/Talons/normal_nii/train", x)
    for x in os.listdir("./Images/Talons/normal_nii/train")
]

abnormal_scan_paths_train = [
    os.path.join(os.getcwd(), "./Images/Talons/abnormal_nii/train", x)
    for x in os.listdir("./Images/Talons/abnormal_nii/train")
]

# Building train dataset

abnormal_scans_train = np.array([process_scan(path) for path in abnormal_scan_paths_train])
normal_scans_train = np.array([process_scan(path) for path in normal_scan_paths_train])

abnormal_labels_train = np.array([1 for _ in range(len(abnormal_scans_train))])
normal_labels_train = np.array([0 for _ in range(len(normal_scans_train))])

x_train = np.concatenate((abnormal_scans_train, normal_scans_train), axis=0)
y_train = np.concatenate((abnormal_labels_train, normal_labels_train), axis=0)

## Building test dataset

#abnormal_scans_test = np.array([process_scan(path) for path in abnormal_scan_paths_test])
#normal_scans_test = np.array([process_scan(path) for path in normal_scan_paths_test])

#abnormal_labels_test = np.array([1 for _ in range(len(abnormal_scans_test))])
#normal_labels_test = np.array([0 for _ in range(len(normal_scans_test))])

#x_test = np.concatenate((abnormal_scans_test, normal_scans_test), axis=0)
#y_test = np.concatenate((abnormal_labels_test, normal_labels_test), axis=0)

To get better results, we’ll define a data augmentation step that rotates the volumes by random angles. It will be performed on the fly during training.

import random
from scipy import ndimage

@tf.function
def rotate(volume):
    """Rotate the volume by a few degrees"""

    def scipy_rotate(volume):
        # define some rotation angles
        angles = [-20, -10, -5, 5, 10, 20]
        # pick angles at random
        angle = random.choice(angles)
        # rotate volume
        volume = ndimage.rotate(volume, angle, reshape=False)
        volume[volume < 0] = 0
        volume[volume > 1] = 1
        return volume

    augmented_volume = tf.numpy_function(scipy_rotate, [volume], tf.float32)
    return augmented_volume


def train_preprocessing(volume, label):
    """Process training data by rotating and adding a channel."""
    # Rotate volume
    volume = rotate(volume)
    volume = tf.expand_dims(volume, axis=3)
    return volume, label


#def test_preprocessing(volume, label):
#    """Process validation data by only adding a channel."""
#    volume = tf.expand_dims(volume, axis=3)
#    return volume, label

# Define data loaders
train_loader = tf.data.Dataset.from_tensor_slices((x_train, y_train))
#test_loader = tf.data.Dataset.from_tensor_slices((x_test, y_test))

batch_size = 2

# Augment on the fly during training
train_dataset = (
    train_loader.shuffle(len(x_train))
    .map(train_preprocessing)
    .batch(batch_size)
    .prefetch(2)
)

## Only rescale
#test_dataset = (
#    test_loader.shuffle(len(x_test))
#    .map(test_preprocessing)
#    .batch(batch_size)
#    .prefetch(2)
#)

Now it’s time to define and build our 3D CNN:

def get_model(width=128, height=128, depth=64):
    """Build a 3D convolutional neural network model."""

    inputs = keras.Input((width, height, depth, 1))

    x = layers.Conv3D(filters=64, kernel_size=3, activation="relu")(inputs)
    x = layers.MaxPool3D(pool_size=2)(x)
    x = layers.BatchNormalization()(x)

    x = layers.Conv3D(filters=64, kernel_size=3, activation="relu")(x)
    x = layers.MaxPool3D(pool_size=2)(x)
    x = layers.BatchNormalization()(x)

    x = layers.Conv3D(filters=128, kernel_size=3, activation="relu")(x)
    x = layers.MaxPool3D(pool_size=2)(x)
    x = layers.BatchNormalization()(x)

    x = layers.Conv3D(filters=256, kernel_size=3, activation="relu")(x)
    x = layers.MaxPool3D(pool_size=2)(x)
    x = layers.BatchNormalization()(x)

    x = layers.GlobalAveragePooling3D()(x)
    x = layers.Dense(units=512, activation="relu")(x)
    x = layers.Dropout(0.3)(x)

    outputs = layers.Dense(units=1, activation="sigmoid")(x)

    # Define the model.
    model = keras.Model(inputs, outputs, name="3dcnn")
    return model
  
# Build model
model = get_model(width=128, height=128, depth=64)
model.summary()

# Compile model
initial_learning_rate = 0.0001
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate, decay_steps=100000, decay_rate=0.96, staircase=True
)
model.compile(
    loss="binary_crossentropy",
    optimizer=keras.optimizers.Adam(learning_rate=lr_schedule),
    metrics=["acc"],
)

# Define callbacks
checkpoint_cb = keras.callbacks.ModelCheckpoint(
    "3d_image_classification.h5", save_best_only=True)

# Train the model
epochs = 100
model.fit(
    train_dataset,
    epochs=epochs,
    shuffle=True,
    verbose=2,
    callbacks=[checkpoint_cb]
)

Once the model is trained, we can make a prediction for a single CT scan by running the following chunk, where PREPROCESSED_CTSCAN stands for a volume that has been run through process_scan:

# Load best weights.
model.load_weights("3d_image_classification.h5")
prediction = model.predict(np.expand_dims(PREPROCESSED_CTSCAN, axis=0))[0]
scores = [1 - prediction[0], prediction[0]]

class_names = ["normal", "abnormal"]
for score, name in zip(scores, class_names):
    print(
        "This model is %.2f percent confident that CT scan is %s"
        % ((100 * score), name)
    )

Bibliography

[1]
P. Suetens, R. Verbeeck, D. Delaere, J. Nuyts, and B. Bijnens, “Model-based image segmentation: Methods and applications,” in AIME 91, 1991, pp. 3–24.
[2]
H. Chen, “DICOM Processing and Segmentation in Python.” https://www.raddq.com/dicom-processing-segmentation-visualization-in-python/
[3]
K. Ushey, J. Allaire, and Y. Tang, Reticulate: Interface to ’Python’. 2022. Available: https://rstudio.github.io/reticulate/
[4]
A. C. Alamo, Como usar python en RStudio [How to use Python in RStudio]. 2020. Available: https://acca3003.medium.com/como-usar-python-en-rstudio-335f953c0efd
[5]
M. P. A. Starmans, S. R. van der Voort, J. M. Castillo Tovar, J. F. Veenland, S. Klein, and W. J. Niessen, “Chapter 18 - Radiomics: Data mining using quantitative medical image features,” in Handbook of medical image computing and computer assisted intervention, S. K. Zhou, D. Rueckert, and G. Fichtinger, Eds. Academic Press, 2020, pp. 429–456. doi: https://doi.org/10.1016/B978-0-12-816176-0.00023-5.
[6]
F. Sultana, A. Sufian, and P. Dutta, “Evolution of Image Segmentation using Deep Convolutional Neural Network: A Survey,” 2020.
[7]
S. Minaee, Y. Boykov, F. Porikli, A. Plaza, N. Kehtarnavaz, and D. Terzopoulos, “Image Segmentation Using Deep Learning: A Survey,” 2020.
[8]
K. K. D. Ramesh, G. K. Kumar, K. Swapna, D. Datta, and S. S. Rajest, “A Review of Medical Image Segmentation Algorithms,” 2021.
[9]
H. Seo et al., “Machine Learning Techniques for Biomedical Image Segmentation: An Overview of Technical Aspects and Introduction to State-of-Art Applications,” 2020.